16 research outputs found

    Kernel-based independence tests for causal structure learning on functional data

    Full text link
    Measurements of systems taken along a continuous functional dimension, such as time or space, are ubiquitous in many fields, from the physical and biological sciences to economics and engineering.Such measurements can be viewed as realisations of an underlying smooth process sampled over the continuum. However, traditional methods for independence testing and causal learning are not directly applicable to such data, as they do not take into account the dependence along the functional dimension. By using specifically designed kernels, we introduce statistical tests for bivariate, joint, and conditional independence for functional variables. Our method not only extends the applicability to functional data of the HSIC and its d-variate version (d-HSIC), but also allows us to introduce a test for conditional independence by defining a novel statistic for the CPT based on the HSCIC, with optimised regularisation strength estimated through an evaluation rejection rate. Our empirical results of the size and power of these tests on synthetic functional data show good performance, and we then exemplify their application to several constraint- and regression-based causal structure learning problems, including both synthetic examples and real socio-economic data

    Kernel-based Joint Independence Tests for Multivariate Stationary and Non-stationary Time Series

    Full text link
    Multivariate time series data that capture the temporal evolution of interconnected systems are ubiquitous in diverse areas. Understanding the complex relationships and potential dependencies among co-observed variables is crucial for the accurate statistical modelling and analysis of such systems. Here, we introduce kernel-based statistical tests of joint independence in multivariate time series by extending the dd-variable Hilbert-Schmidt independence criterion (dHSIC) to encompass both stationary and non-stationary processes, thus allowing broader real-world applications. By leveraging resampling techniques tailored for both single- and multiple-realisation time series, we show how the method robustly uncovers significant higher-order dependencies in synthetic examples, including frequency mixing data and logic gates, as well as real-world climate and socioeconomic data. Our method adds to the mathematical toolbox for the analysis of multivariate time series and can aid in uncovering high-order interactions in data.Comment: 15 pages, 7 figure

    Kernel Two-Sample and Independence Tests for Non-Stationary Random Processes

    Get PDF
    Two-sample and independence tests with the kernel-based MMD and HSIC have shown remarkable results on i.i.d. data and stationary random processes. However, these statistics are not directly applicable to non-stationary random processes, a prevalent form of data in many scientific disciplines. In this work, we extend the application of MMD and HSIC to non-stationary settings by assuming access to independent realisations of the underlying random process. These realisations - in the form of non-stationary time-series measured on the same temporal grid - can then be viewed as i.i.d. samples from a multivariate probability distribution, to which MMD and HSIC can be applied. We further show how to choose suitable kernels over these high-dimensional spaces by maximising the estimated test power with respect to the kernel hyper-parameters. In experiments on synthetic data, we demonstrate superior performance of our proposed approaches in terms of test power when compared to current state-of-the-art functional or multivariate two-sample and independence tests. Finally, we employ our methods on a real socio-economic dataset as an example application

    Non-linear interlinkages and key objectives amongst the Paris Agreement and the Sustainable Development Goals

    Get PDF
    The United Nations' ambitions to combat climate change and prosper human development are manifested in the Paris Agreement and the Sustainable Development Goals (SDGs), respectively. These are inherently inter-linked as progress towards some of these objectives may accelerate or hinder progress towards others. We investigate how these two agendas influence each other by defining networks of 18 nodes, consisting of the 17 SDGs and climate change, for various groupings of countries. We compute a non-linear measure of conditional dependence, the partial distance correlation, given any subset of the remaining 16 variables. These correlations are treated as weights on edges, and weighted eigenvector centralities are calculated to determine the most important nodes. We find that SDG 6, clean water and sanitation, and SDG 4, quality education, are most central across nearly all groupings of countries. In developing regions, SDG 17, partnerships for the goals, is strongly connected to the progress of other objectives in the two agendas whilst, somewhat surprisingly, SDG 8, decent work and economic growth, is not as important in terms of eigenvector centrality

    Kernel Two-Sample and Independence Tests for Nonstationary Random Processes

    No full text
    Two-sample and independence tests with the kernel-based mmd and hsic have shown remarkable results on i.i.d. data and stationary random processes. However, these statistics are not directly applicable to nonstationary random processes, a prevalent form of data in many scientific disciplines. In this work, we extend the application of mmd and hsic to nonstationary settings by assuming access to independent realisations of the underlying random process. These realisations—in the form of nonstationary time-series measured on the same temporal grid—can then be viewed as i.i.d. samples from a multivariate probability distribution, to which mmd and hsic can be applied. We further show how to choose suitable kernels over these high-dimensional spaces by maximising the estimated test power with respect to the kernel hyperparameters. In experiments on synthetic data, we demonstrate superior performance of our proposed approaches in terms of test power when compared to current state-of-the-art functional or multivariate two-sample and independence tests. Finally, we employ our methods on a real socioeconomic dataset as an example application
    corecore